Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virt_mshv_vtl: Proxy irr filtering #609

Merged
merged 13 commits into from
Jan 17, 2025

Conversation

kmehtaintel
Copy link
Contributor

@kmehtaintel kmehtaintel commented Jan 3, 2025

Implementation for issues: #554 #563
The filter is kept updated for all VPs (i.e. during SINT updates and HvCallRetargetDeviceInterrupt hypercall) and applied during the IRR bitmap collection i.e. ProcessorRunner::proxy_irr() is extended to apply proxy_irr_filter before returning final IRR bitmap.

@kmehtaintel kmehtaintel requested review from a team as code owners January 3, 2025 00:26
@@ -691,6 +693,17 @@ impl<'p, T: Backing> Processor for UhProcessor<'p, T> {
}

for vtl in [GuestVtl::Vtl1, GuestVtl::Vtl0] {
#[cfg(guest_arch = "x86_64")]
Copy link
Contributor

@smalis-msft smalis-msft Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of guest_arch cfgs throughout this PR, how many are actually needed for the code to compile on all platforms? We generally prefer to have as few of these as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation is only for x86 and the invocation related to filtering happen to be in common path, hence had to add guest_arch_cfgs. Added those after compiling for both aarch64 and x64-cvm. Will take a look if I can further reduce it.

@@ -37,6 +37,8 @@ use super::UhVpInner;
use crate::GuestVsmState;
use crate::GuestVtl;
use crate::WakeReason;
#[cfg(guest_arch = "x86_64")]
use bitvec::prelude::*;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't use glob imports.

Also if this is going to stay a guest_arch cfg'd import it can move into the cfg_if block above.

/// New instance for requested VP count
fn new(vp_count: u32) -> Self {
DeviceIrrFilter {
device_irr_filter: BitVec::repeat(false, 256).into_boxed_bitslice(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to always be a constant size should it be a BitArray instead?

Copy link
Contributor Author

@kmehtaintel kmehtaintel Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is only for x86 platform and yes, on that IRR bitmap will always be constant of 256 bits. I will change this to BitArray.

}

/// Mark the completion for `proxy_irr_filter` update for VP
fn clr_vp_proxy_irr_filter_update(&self, vp_index: u32) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: don't shorten words, just make it 'clear'

@@ -223,6 +275,9 @@ struct UhPartitionInner {
no_sidecar_hotplug: AtomicBool,
use_mmio_hypercalls: bool,
backing_shared: BackingShared,
#[inspect(skip)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something we'd want wired up to inspect

/// For requester VP to issue `proxy_irr_filter` update to other VPs
#[cfg(guest_arch = "x86_64")]
fn request_proxy_irr_filter_update(&self, vtl: GuestVtl, device_vector: u8, req_vp_index: u32) {
tracing::info!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How frequently is this expected to happen? I feel like the tracing level should be debug or trace. Info is turned on by default, will this be too noisy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applies throughout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Though it's not going to be frequent (one time SINT MSR writes and Device retarget), but debug is the right level here. Will change.

// excluding the requester VP (requester itself take care of updating its filter)
device_irr[vtl].set_vps_proxy_irr_filter_update(req_vp_index);

// Wake all the VPs, once the VP wakeup, it will query if `proxy_irr_filter`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should drop the lock before waking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Will scope if only for write device vector and vps bitmap.

@@ -1609,6 +1721,8 @@ impl<'a> UhProtoPartition<'a> {
no_sidecar_hotplug: params.no_sidecar_hotplug.into(),
use_mmio_hypercalls: params.use_mmio_hypercalls,
backing_shared: BackingShared::new(isolation, BackingSharedParams { cvm_state })?,
#[cfg(guest_arch = "x86_64")]
device_irr_filter: RwLock::new(VtlArray::from_fn(|_| DeviceIrrFilter::new(vps_count))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we're only proxying IRRs for VTL 0, so this probably doesn't need to be a VtlArray, it can just be a single.

@@ -1854,10 +1855,25 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
*r = irr.swap(0, Ordering::Relaxed);
}
}

// `proxy_irr`received from host is untrusted, only allow vectors that L2 expects
for (f, v) in self.run.as_ref().proxy_irr_filter.iter().zip(r.iter_mut()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this anymore? I thought the kernel does the filtering now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kernel filtering the vectors would definitely be the ideal thing to do and with that notion only I added proxy_irr_filter field in hcl_run page, so that kernel can later use if (if/when needed). My understanding is that even when kernel opt to do irr filtering it has to somehow know what the valid vectors are that a HW-isolated CVM partition is expecting, and that information is only available in user-mode i.e. OpenVMM app (SINT and retarget intercepts).
Is there a branch that I can refer too where such work is in-progress ? If that work is already in flight, then I agree we don't need this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, to be clear, I just meant this specific code path, which performs the filtering--some recent kernel changes added filtering in the kernel but did not update user mode to configure it. See microsoft/OHCL-Linux-Kernel@10f345a.

So the rest of the code, to configure the filtering, is still required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the user mode part of this isn't merged yet. You probably want to rebase on #124, which I'll try to push to get merged today.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Thanks for quick response and tagging the corresponding kernel change.
I will just put a note above this change, something like : N.B This code will be duplicate and should be removed once the kernel IRR bitmap filtering are merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jstarks, I looked at the changes for #124 and can see a new field proxy_irr_blocked in hcl_run page. If my understanding is correct, then this field is of reverse polarity than what I defined i.e. bits for the vectors that needs to be blocked should be set here (and assuming that's how kernel must be using it before writing to proxy_irr) ?

If this is correct, then as part of this PR, once I rebase it to #124, then I should be initializing proxy_irr_blocked to all 1 and then unset for trusted vectors. Right ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. (We used reverse polarity to avoid a breaking change.)

#[cfg(guest_arch = "x86_64")]
if self.partition.isolation.is_hardware_isolated() {
// Complete any proxy filter update if required
self.partition.complete_vp_proxy_filter_update(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This takes a lock (a read lock, but still) for every VP entry. I think we need to avoid this. Can you instead add a wake reason for updating the filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, that's a great suggestion! With this new wake reason, which is for a VP, I can entirely get rid of the proxy_irr_filter_update_vps bitmap (which was needed for request tracking) and combining that with @smalis-msft suggestion (as device retarget is only for VTL0) I think I can fully get rid of DeviceIrrFilter and have only device_irr_filter array in UhPartitionInner and move all helper methods to UhPartitionInner.

Please do share your thoughts on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@jstarks jstarks changed the title Proxy irr filtering virt_mshv_vtl: Proxy irr filtering Jan 6, 2025
let run = MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
let run: MappedPage<hcl_run> =
MappedPage::new(fd, vp as i64).map_err(|e| Error::MmapVp(e, None))?;
// SAFETY: Initializing `proxy_irr_blocked` to block all initially
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how SAFETY comments work. SAFETY comments are intended to explain why the unsafe operation you're performing does not violate any of rust's safety rules. If you're interested I'd suggest reading chapter 1 of the nomicon (https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html). But for this case it's probably enough to look up other places where we touch the run page and mimic what they say.

@@ -1857,6 +1863,20 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
}
}

/// Update the `proxy_irr_blocked` in run page
pub fn update_proxy_irr_filter(&mut self, irr_filter: &BitArray<[u32; 8], Lsb0>) {
// SAFETY: `proxy_irr_blocked` is accessed in both user and kernel, but from current VP
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest saying something like "SAFETY: proxy_irr_blocked is not accessed by any other VPs, so we know we have exclusive access"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is true. The kernel will access it in an interrupt context--only from the current VP yes, but we still need to use at least relaxed atomic reads/writes to access the values to satisfy the Rust and Linux kernel memory models.

Copy link
Contributor Author

@kmehtaintel kmehtaintel Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh that's right, thanks for catching this... OpenVMM has U/K model, so yes I agree, same VP running on user context can be interrupted and then kernel may access this field. Will move to atomics, like the way it has been done while accessing proxy_irr.


for irr_bit in irr_filter.iter_ones() {
tracing::debug!(irr_bit, "update_proxy_irr_filter");
proxy_irr_blocked[irr_bit >> 5] &= !(1 << (irr_bit & 0x1F));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a comment explaining this math? Why are we shifting things?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler can be relied upon to strength reduce a constant power-of-two division/remainder to a shift/and. So I'd probably write this as proxy_irr_blocked[irr_bit / 32] &= !(1 << (irr_bit % 32)) to make this clearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having said that, it seems like you can just directly store irr_filter.into_inner().map(|v| !v) into proxy_irr_blocked rather than do this iteration. That seems a lot simpler (maybe just a little more complicated once you switch to using atomics).

I'd also suggest just taking a &[u32; 8] as a parameter rather than adding BitArray to the public interface for this crate.


#[cfg(guest_arch = "x86_64")]
if wake_reasons.update_proxy_irr_filter()
&& self.partition.isolation.is_hardware_isolated()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this wake reason ever get set on non-isolated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are only issuing this wake request from impl<T: CpuIo, B: HardwareIsolatedBacking> UhHypercallHandler<'_, '_, T, B>::retarget_physical_interrupt, so this will never be set on non-isolated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check that we're isolated in this if then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree its duplicate; we can get rid of it. Was just making it more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put an assertion if you want.

// If updated is Synic MSR, then check if its proxy or previous was proxy
// in either case, we need to update the `proxy_irr_blocked`
let mut irr_filter_update = false;
if matches!(msr, hvdef::HV_X64_MSR_SINT0..=hvdef::HV_X64_MSR_SINT15) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the order of these ifs, can we then merge the two checking hv?

@@ -1857,6 +1862,21 @@ impl<'a, T: Backing> ProcessorRunner<'a, T> {
}
}

/// Update the `proxy_irr_blocked` in run page
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you be more descriptive that irr_filter is the vectors you want to allow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and also, that we don't respect what's currently set - this fully overwrites it?

Copy link
Contributor Author

@kmehtaintel kmehtaintel Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will add a comment that irr_filter is allowed vector bitmap (or may be name change?) and yes, this function overwrites the existing set, without respecting what is set, as the intention is the caller is going to provide the cumulative map. As you can see currently, before invoking this method, first we iterate over all sints and then we capture all device vectors (device_vector_table) and finally cumulative map is the one we pass as irr_filter argument. Also, device vector map i.e.. device_vector_table, is global at partition level and vectors are never removed from it. So, whenever there is a SINT update or device retarget VMCALL, all SINTs vector and device_vector_table is considered to populate a cumulative map, before invoking this method.

Copy link
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few minor nits but looks good to me otherwise


/// Get current partition global device irr vectors (VTL0 for now)
#[cfg(guest_arch = "x86_64")]
fn get_device_vectors(&self, _vtl: GuestVtl, irr_vectors: &mut IrrBitmap) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we usually don't use get in function names for rust. prefer something fill_device_vectors or something?

openhcl/virt_mshv_vtl/src/lib.rs Show resolved Hide resolved
fn update_proxy_irr_filter(&mut self, vtl: GuestVtl) {
let mut irr_bits: BitArray<[u32; 8], Lsb0> = BitArray::new(Default::default());

// Get all not maksed && proxy SINT vectors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Get all not maksed && proxy SINT vectors
// Get all not masked && proxy SINT vectors

@chris-oo chris-oo merged commit d1413d9 into microsoft:main Jan 17, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants